ABSTRACT


Tests of hypotheses for the parameters in a general linear model are considered based on weighted rank statistics. Results are presented for tests based on a rank estimate, tests based on the drop in dispersion and aligned rank tests. Weights can be used to focus the analysis on simple effects and provide an additional degree of robustness to rank tests. Several analysis of variance applications are discussed.


Key Words and Phrases: analysis of variance, dispersion function, robust tests, aligned ranks, rank estimates, weights


1.  INTRODUCTION 


Statistical procedures based on ranks are widely used for simple linear model problems. For the general linear model there has also been considerable development in the area of rank methods. These results show that rank methods are nearly as efficient as the classical least-squares methods when the errors are normally distributed, and more efficient for many other distributions. The least-squares methods can be inefficient for non-normal distributions (for both large and small sample sizes) and are sensitive to outliers and high leverage points, while the rank methods are more robust.

This paper considers weighted rank statistics for linear model problems. Weights are usually introduced in statistical methods to increase efficiency, but that is not the case here. Instead, the interest is in using weights so as not to lose efficiency while gaining in other respects, in particular a further degree of robustness. This is discussed further in section 6.

The emphasis will be on analysis of variance problems, and in this case weights of zero or one are of special concern. They have the effect of restricting the ranking to various subsets of the data instead of ranking the entire set. A familiar example is the within-block ranking used for Friedman's test. It should be noted that the large sample results considered here would apply in block designs with within-block comparisons as the number of observations per block grows large with a fixed number of blocks; they would not apply in the reverse case of fixed block size with the number of blocks growing large.




In one-way and higher order analysis of variance problems there are many rank tests in popular use that are based on restricted ranking or restricted comparisons. For example, in block designs the ranking may be done separately in each block with no comparison between observations in different blocks. In testing the effects of several treatments against a control, the treatment groups may be compared only to the control group and not to each other. In testing the equality of several groups against an ordered alternative, Tryon and Hettmansperger (1973) discussed the value of using only comparisons between adjacent groups.

The methodology of this paper includes the types of restricted comparisons above as special cases.

Results on the asymptotic distributions of estimates and test statistics that have appeared separately in the literature can be obtained in a unified framework from Remarks 2.1 and 2.3. This common view of many diverse problems can be valuable in promoting understanding of their common structure and in suggesting new solutions to other problems. The general results may provide the details necessary to modify a standard procedure to better fit a particular application. A computer program based on the general approach would be able to handle a wide range of problems.

Consider the linear model

(1.1)   $Y = \beta_0\mathbf{1} + X\beta + e$,

where $Y = (Y_1,\ldots,Y_n)'$ is an $n \times 1$ random vector, $\mathbf{1}$ is an $n \times 1$ vector with each element equal to one, $X = (x_{ik})$ is an $n \times p$ design matrix, $\beta = (\beta_1,\ldots,\beta_p)'$ is a $p \times 1$ vector of parameters and $e = (e_1,\ldots,e_n)'$ is an $n \times 1$ vector of random errors. Assume that $X$ is centered so that its column sums are zero. Assume that the errors are independent with a common distribution having density function $f$. The residuals are denoted by $Z = (Z_1,\ldots,Z_n)'$, where

$Z = Z(b) = Y - Xb$.

A weighted rank estimate of $\beta$ is reviewed in section 2 along with some theoretical results that will be needed in the rest of the paper. Longer proofs are delayed until the Appendix. Sections 3, 4 and 5 discuss tests of hypotheses based on the estimate $\hat\beta$, on the drop in dispersion and on aligned ranks, respectively. Some advantages in the use of weights are considered in section 6, and applications to a variety of analysis of variance problems are discussed in section 7.
2. PRELIMINARIES


Consider the dispersion function of the residuals

(2.1)   $D = D(b) = \sum_{i<j} w_{ij}\,|Z_i - Z_j|$,


where the $w_{ij} \ge 0$, $1 \le i < j \le n$, are a given set of weights. The weights should reflect the importance of the comparisons. They may depend on the design matrix $X$. Some of the weights can be zero to drop some pairwise comparisons from consideration. The special case of equal weights, $w_{ij} = 1$, gives rise to Gini's mean difference. Hettmansperger and McKean (1978a) have shown that in this case the dispersion function is equivalent to Jaeckel's dispersion function with Wilcoxon scores.

The dispersion function $D$ can be expressed in another form. Let $(R_1,\ldots,R_n)$ denote the ranks of the residuals; that is, $R_i$ is the rank of $Z_i$ in the set $\{Z_1,\ldots,Z_n\}$, $1 \le i \le n$. Let $\mathrm{sgn}(v) = +1, 0, -1$ as $v > 0$, $v = 0$, $v < 0$. Extend the definition of the weights $w_{ij}$ to all subscripts $i, j = 1,\ldots,n$ by using $w_{ji} = w_{ij}$ and $w_{ii} = 0$. Then, using $|v| = v\,\mathrm{sgn}(v)$, some manipulation shows

(2.2)   $D = \sum_{i=1}^{n} B_i Z_i$,

with $B_i = B_i(b) = \sum_j w_{ij}\,\mathrm{sgn}(Z_i - Z_j)$, $i = 1,\ldots,n$. The coefficients are random, with $B_i$ depending on the rank of $Z_i$ and also on the subscripts of the residuals that are less than $Z_i$. In the special case of equal weights, $w_{ij} = 1$, and no ties, $B_i = 2R_i - n - 1$ depends only on the rank of $Z_i$.
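As a check on (2.2), the following Python sketch (our illustration, not code from the paper; all names are hypothetical) evaluates the dispersion both as the weighted pairwise sum (2.1) and as the linear form $\sum_i B_iZ_i$. The weights are stored as a full symmetric matrix with zero diagonal, following the extension $w_{ji} = w_{ij}$, $w_{ii} = 0$. With all weights equal to one and no ties, it also confirms $B_i = 2R_i - n - 1$.

```python
from fractions import Fraction


def sgn(v):
    """Sign function: +1, 0, -1 as v > 0, v = 0, v < 0."""
    return (v > 0) - (v < 0)


def dispersion_pairwise(z, w):
    """D = sum_{i<j} w_ij |z_i - z_j|, as in (2.1)."""
    n = len(z)
    return sum(w[i][j] * abs(z[i] - z[j]) for i in range(n) for j in range(i + 1, n))


def b_coefficients(z, w):
    """B_i = sum_j w_ij sgn(z_i - z_j), using the symmetric extension of the weights."""
    n = len(z)
    return [sum(w[i][j] * sgn(z[i] - z[j]) for j in range(n)) for i in range(n)]


def dispersion_linear(z, w):
    """D = sum_i B_i z_i, as in (2.2)."""
    return sum(b * zi for b, zi in zip(b_coefficients(z, w), z))


# Residuals and an arbitrary symmetric weight matrix with zero diagonal.
z = [Fraction(3), Fraction(-1), Fraction(5, 2), Fraction(0)]
w = [[0, 1, 2, 0],
     [1, 0, 1, 3],
     [2, 1, 0, 1],
     [0, 3, 1, 0]]
print(dispersion_pairwise(z, w), dispersion_linear(z, w))  # 14 14

# Equal weights: B_i = 2 R_i - n - 1 when there are no ties.
n = len(z)
ones = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
ranks = [sorted(z).index(zi) + 1 for zi in z]
print(b_coefficients(z, ones), [2 * r - n - 1 for r in ranks])
```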


The estimate $\hat\beta$ of the parameter $\beta$ will be a point in the parameter space minimizing the dispersion function (see Sievers (1983)). The partial derivatives of $D$ are (approximately) equal to zero at the minimum. Using (2.2), these derivatives are

(2.3)   $\partial D/\partial b_k = -\sum_{i=1}^{n} x_{ik} B_i$,

for $k = 1,\ldots,p$, except at a finite number of points. Letting $a_{ij}(k) = w_{ij}(x_{jk} - x_{ik})$, another form of the derivatives is

(2.4)   $\partial D/\partial b_k = -2\sum_{i<j} a_{ij}(k)\,\phi(Z_i, Z_j) + \sum_{i<j} a_{ij}(k)$,

where $\phi(u, v) = 0, 1/2, 1$ as $u > v$, $u = v$, $u < v$. A matrix form is

(2.5)   $\partial D/\partial b = -X'B$,

where $B = (B_i)$ is $n \times 1$.
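For a single regressor ($p = 1$) the minimization can be carried out exactly. Writing $Z_i - Z_j = (x_i - x_j)(s_{ij} - b)$ with pairwise slope $s_{ij} = (Y_i - Y_j)/(x_i - x_j)$ shows that $D(b) = \sum_{i<j} w_{ij}|x_i - x_j|\,|s_{ij} - b|$ plus a term free of $b$, so a weighted median of the slopes with weights $w_{ij}|x_i - x_j|$ minimizes $D$. The Python sketch below is our illustration of this special case (the function names are hypothetical) and uses exact rational arithmetic.

```python
from fractions import Fraction


def dispersion(x, y, w, b):
    """D(b) = sum_{i<j} w_ij |Z_i - Z_j| with Z = y - x b (single regressor)."""
    n = len(x)
    z = [yi - xi * b for xi, yi in zip(x, y)]
    return sum(w[i][j] * abs(z[i] - z[j]) for i in range(n) for j in range(i + 1, n))


def weighted_rank_slope(x, y, w):
    """Minimize D(b) for p = 1 via a weighted median of pairwise slopes.

    Pairs with x_i == x_j or w_ij == 0 do not depend on b and are skipped.
    """
    n = len(x)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            if dx != 0 and w[i][j] != 0:
                pairs.append((Fraction(y[i] - y[j]) / dx, w[i][j] * abs(dx)))
    pairs.sort()
    total = sum(weight for _, weight in pairs)
    cumulative = 0
    for slope, weight in pairs:
        cumulative += weight
        if 2 * cumulative >= total:  # first slope reaching half of the total weight
            return slope
    raise ValueError("no informative pairs")


x = [-2, -1, 0, 1, 2]
y = [-3, -1, 1, 2, 5]
n = len(x)
w = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
b_hat = weighted_rank_slope(x, y, w)
print(b_hat, dispersion(x, y, w, b_hat))  # 2 4
```

Because $D$ is convex and piecewise linear in $b$, the first slope at which the cumulative weight reaches half the total is a minimizer; ties among slopes do not affect this.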

For the results to follow it is convenient to use a multiple of the derivative. Define a random vector $U(b) = (U_1(b),\ldots,U_p(b))'$ by

(2.6)   $U(b) = (1/2)\,n^{-3/2} X'B$.

Note that $U_k(b) = n^{-3/2}\left[\sum_{i<j} a_{ij}(k)\,\phi(Z_i, Z_j) - \sum_{i<j} a_{ij}(k)/2\right]$.
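The two expressions for $U(b)$ can be compared numerically. The sketch below (hypothetical helper names, floating-point arithmetic, data chosen only for illustration) evaluates (2.6) through the coefficients $B_i$ and the pairwise form through $a_{ij}(k)$ and $\phi$; the two agree up to rounding, because $\mathrm{sgn}(Z_i - Z_j) = 1 - 2\phi(Z_i, Z_j)$, ties included.

```python
import math


def sgn(v):
    return (v > 0) - (v < 0)


def phi(u, v):
    """phi(u, v) = 0, 1/2, 1 as u > v, u = v, u < v."""
    return 0.0 if u > v else (0.5 if u == v else 1.0)


def residuals(X, y, b):
    return [y[i] - sum(X[i][k] * b[k] for k in range(len(b))) for i in range(len(y))]


def u_via_b(X, y, w, b):
    """U(b) = (1/2) n^{-3/2} X'B, equation (2.6)."""
    n, p = len(X), len(X[0])
    z = residuals(X, y, b)
    B = [sum(w[i][j] * sgn(z[i] - z[j]) for j in range(n)) for i in range(n)]
    return [0.5 * n ** -1.5 * sum(X[i][k] * B[i] for i in range(n)) for k in range(p)]


def u_via_pairs(X, y, w, b):
    """U_k(b) = n^{-3/2} [sum_{i<j} a_ij(k) phi(Z_i, Z_j) - sum_{i<j} a_ij(k) / 2]."""
    n, p = len(X), len(X[0])
    z = residuals(X, y, b)
    out = []
    for k in range(p):
        total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                a = w[i][j] * (X[j][k] - X[i][k])  # a_ij(k)
                total += a * phi(z[i], z[j]) - a / 2
        out.append(total * n ** -1.5)
    return out


X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, 0.0], [1.5, 0.0]]
y = [0.3, -1.2, 2.0, 0.7]
w = [[0, 1, 2, 1],
     [1, 0, 1, 0],
     [2, 1, 0, 3],
     [1, 0, 3, 0]]
b = [0.5, -0.2]
print(u_via_b(X, y, w, b))
print(u_via_pairs(X, y, w, b))
```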

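Some constants will also be needed. For $k = 1,\ldots,p$, let

$a_{i\cdot}(k) = \sum_{j=i+1}^{n} a_{ij}(k)$ for $i = 1,\ldots,n-1$,
$a_{\cdot j}(k) = \sum_{i=1}^{j-1} a_{ij}(k)$ for $j = 2,\ldots,n$,
$a_{n\cdot}(k) = 0$, $a_{\cdot 1}(k) = 0$,
$a_{\cdot\cdot}(k) = \sum_{i<j} a_{ij}(k)$

and

$A_i(k) = a_{\cdot i}(k) - a_{i\cdot}(k)$.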



For asymptotic purposes, a sequence of these constants is needed, indexed on $n = 1, 2, \ldots$, but this dependence on $n$ for these and other quantities will sometimes be suppressed in the notation.

Let $W_n$ be an $n \times n$ symmetric matrix involving the weights of the dispersion function. Specifically, define the $(i,j)$th element of $W_n$ to be $-w_{ij}$ if $i < j$ and $-w_{ji}$ if $i > j$. The $i$th diagonal element of $W_n$ is $w_{i\cdot} = \sum_{j} w_{ij}$. Thus $W_n$ has the negatives of the dispersion function weights for its off-diagonal elements and positive diagonal elements determined so that the row and column sums are zero. Also write

$V_n = X'W_nW_nX$,
$C_n = X'W_nX$.
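The definitions of $W_n$, $C_n$ and $V_n$ can be checked on a small example. The sketch below (hypothetical names, integer data) builds $W_n$ from a symmetric weight matrix and confirms three facts that follow directly from the definitions: the rows of $W_n$ sum to zero; $C_n = X'W_nX$ equals $\sum_{i<j} w_{ij}(x_i - x_j)(x_i - x_j)'$, where $x_i'$ is the $i$th row of $X$; and, with $A_i(k) = a_{\cdot i}(k) - a_{i\cdot}(k)$ as defined above, $A_i(k)$ is the $(i,k)$ element of $W_nX$, so that $V_n = (W_nX)'(W_nX)$ is built from the same constants.

```python
def weight_matrix(w):
    """W_n: -w_ij off the diagonal, w_i. = sum_j w_ij on it (rows sum to zero)."""
    n = len(w)
    return [[sum(w[i]) if i == j else -w[i][j] for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def c_pairwise(X, w):
    """sum_{i<j} w_ij (x_i - x_j)(x_i - x_j)' for the rows x_i of X."""
    n, p = len(X), len(X[0])
    return [[sum(w[i][j] * (X[i][k] - X[j][k]) * (X[i][l] - X[j][l])
                 for i in range(n) for j in range(i + 1, n))
             for l in range(p)] for k in range(p)]


def a_constants(X, w, k):
    """A_i(k) = a_.i(k) - a_i.(k), with a_ij(k) = w_ij (x_jk - x_ik) for i < j."""
    n = len(X)

    def a(i, j):
        return w[i][j] * (X[j][k] - X[i][k])

    row = [sum(a(i, j) for j in range(i + 1, n)) for i in range(n)]  # a_i.(k)
    col = [sum(a(i, j) for i in range(j)) for j in range(n)]          # a_.j(k)
    return [col[i] - row[i] for i in range(n)]


X = [[-2, 1], [-1, -1], [0, 0], [3, 0]]   # centered columns
w = [[0, 1, 0, 2],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [2, 1, 1, 0]]
W = weight_matrix(w)
WX = matmul(W, X)
Cn = matmul(transpose(X), WX)
Vn = matmul(transpose(WX), WX)            # X'W_n W_n X, since W_n is symmetric
print(Cn, c_pairwise(X, w))
print([a_constants(X, w, k) for k in range(2)], transpose(WX))
```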


ASSUMPTION (A1): For each $k = 1,\ldots,p$,
$\sum_{i=1}^{n} A_i^2(k) \big/ \max_{1 \le i \le n} A_i^2(k) \to \infty$ as $n \to \infty$.

ASSUMPTION (A2): For each $k = 1,\ldots,p$,
$\sum_{i<j} a_{ij}^2(k) \big/ \sum_{i=1}^{n} A_i^2(k) \to 0$ as $n \to \infty$.

ASSUMPTION (A3): For each $k = 1,\ldots,p$,
$\sum_{i<j} a_{ij}^2(k) \big/ \binom{n}{2}$ is bounded as $n \to \infty$.

ASSUMPTION (A4): For each $k = 1,\ldots,p$,
$\max_{1 \le i \le n} |x_{ik}| / \sqrt{n} \to 0$ as $n \to \infty$.

ASSUMPTION (A5): $(1/n)X'X \to \Sigma$ as $n \to \infty$, where $\Sigma$ is a $p \times p$ positive definite matrix.

ASSUMPTION (A6): $n^{-3}V_n \to V$ as $n \to \infty$, where $V$ is a $p \times p$ positive definite matrix.

ASSUMPTION (A7): $n^{-2}C_n \to C$ as $n \to \infty$, where $C$ is a $p \times p$ nonsingular matrix.

Let $G(y) = P(e_1 - e_2 \le y)$ denote the cdf of the difference of independent random variables, each with density $f$.

ASSUMPTION (A8): The cdf $G$ has a density $g = G'$, and $g(y)$ is continuous at $y = 0$ with $g(0) > 0$.

ASSUMPTION (A9): The error density $f$ is absolutely continuous and $\int (f'/f)^2 f\,dx < \infty$.


The following remarks are from Theorems 4.1, 5.1 and 6.1 of Sievers (1983) with some change in notation.

REMARK 2.1. Let (A1)-(A9) hold. Let $\Delta$ be a fixed $p \times 1$ vector. Then as $n \to \infty$,

$U(0)\,\big|_{\Delta/\sqrt{n}} \xrightarrow{d} N(\gamma C\Delta,\ (1/12)V)$,

where $\gamma = \int f^2$.

In the notation "$|_\beta$", the $\beta$ specifies the parameter vector in model (1.1).

Let $c > 0$ be given and define a set $\mathcal{D} = \{\Delta : -c \le \Delta_k \le c,\ k = 1,\ldots,p\}$. Let $\|\cdot\|$ be the Euclidean norm.

REMARK 2.2. Let (A1)-(A9) hold. Then as $n \to \infty$,

$\sup_{\Delta \in \mathcal{D}} \|U(\Delta/\sqrt{n}) - U(0) + \gamma C\Delta\|\,\big|_0 \xrightarrow{P} 0$.

REMARK 2.3. Let (A1)-(A9) hold. Then as $n \to \infty$,

$\sqrt{n}(\hat\beta - \beta)\,\big|_\beta \xrightarrow{d} N(0,\ (1/(12\gamma^2))\,C^{-1}VC^{-1})$.

A test of the hypothesis concerning the full parameter vector, $H_0: \beta = 0$, can be based on a quadratic form in $U(0)$ or in $\hat\beta$ by using the large sample results in the preceding remarks. The former has the advantage of not requiring an estimate of $\gamma$.
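Under $H_0: \beta = 0$, Remark 2.1 with $\Delta = 0$ gives $12\,U(0)'V^{-1}U(0)$ approximately $\chi^2_p$. If $V$ is replaced by its finite-sample counterpart $n^{-3}V_n$ (our choice for this illustration; other consistent estimates of $V$ could be used), (2.6) shows that the powers of $n$ cancel and the statistic becomes $3\,B'XV_n^{-1}X'B$ with $B$ evaluated at $b = 0$, so no estimate of $\gamma$ is needed. A sketch for a single regressor ($p = 1$), with hypothetical names:

```python
from fractions import Fraction


def sgn(v):
    return (v > 0) - (v < 0)


def u_test_p1(x, y, w):
    """3 (x'B)^2 / (x'W_n W_n x) with B evaluated at b = 0 (so Z = Y); p = 1."""
    n = len(x)
    B = [sum(w[i][j] * sgn(y[i] - y[j]) for j in range(n)) for i in range(n)]
    # (W_n x)_i = sum_j w_ij (x_i - x_j), because W_n has zero row sums.
    wx = [sum(w[i][j] * (x[i] - x[j]) for j in range(n)) for i in range(n)]
    xb = sum(xi * bi for xi, bi in zip(x, B))
    vn = sum(v * v for v in wx)                 # V_n = x'W_n W_n x
    return Fraction(3 * xb * xb, vn)


x = [-2, -1, 0, 1, 2]
y = [-3, -1, 1, 2, 5]
w = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
print(u_test_p1(x, y, w))  # 24/5, to be compared with a chi-square(1) critical value
```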


3.  TESTING  A  REDUCED  MODEL 


In this section the problem of testing a reduced model is discussed. With the full rank assumption on the design matrix, the problem will be expressed in terms of testing to drop some of the terms from the model. A test based on the rank estimate will be discussed here.

Consider the partitioning $X = (X_1, X_2)$ and $\beta = (\beta_1', \beta_2')'$, where $X_1$ is $n \times p_1$, $X_2$ is $n \times p_2$, $\beta_1$ is $p_1 \times 1$, $\beta_2$ is $p_2 \times 1$ and $p_1 + p_2 = p$. The model (1.1) can then be written as

(3.1)   $Y = \beta_0\mathbf{1} + X_1\beta_1 + X_2\beta_2 + e$.

The reduced model hypothesis to be considered is $H_0: \beta_2 = 0$.

Some further notation will be needed. Let

$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$,   $V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$,

where $C_{22}$ and $V_{22}$ are $p_2 \times p_2$, and let

$G = (-C_{21}C_{11}^{-1},\ I_{p_2})$,   $C_{22\cdot1} = C_{22} - C_{21}C_{11}^{-1}C_{12}$.

A natural test of the hypothesis can be based on a quadratic form in the estimate of $\beta_2$. The estimate minimizing (2.1) can be partitioned $\hat\beta = (\hat\beta_1', \hat\beta_2')'$, and Remark 2.3 implies that

$\sqrt{n}(\hat\beta_2 - \beta_2)\,\big|_\beta \xrightarrow{d} N(0,\ (1/(12\gamma^2))\,\Sigma^*)$,

where $\Sigma^*$ is the lower right $p_2 \times p_2$ submatrix of $C^{-1}VC^{-1}$.


It  can 


(also  see  Remark  4.2). 


4.  DROP  IN  DISPERSION  TEST 


McKean and Hettmansperger (1976, 1978b) proposed a test of $H_0$ based on the drop in a rank dispersion function between the full model and the reduced model. This data fitting criterion is appealing and is directly analogous to the least-squares test statistic. In this section the drop in dispersion statistic is considered for the weighted dispersion function (2.1).

Under $H_0$, the reduced model of (3.1) contains the parameter $\beta_1$. Let $\hat\beta_{1R}$ denote the reduced model estimate of $\beta_1$ obtained by minimizing the dispersion function $D(b_1, 0)$ of (2.1) with respect to the $p_1$ variables in $b_1$. As before, $\hat\beta$ minimizes the dispersion function for the full model. The drop in dispersion test statistic is given by

$S_2 = (12\hat\gamma/n)\,[D(\hat\beta_{1R}, 0) - D(\hat\beta)]$,

where $\hat\gamma$ is a consistent estimate of $\gamma$.
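For the simplest case, $p = 1$ with $p_1 = 0$ (so $H_0: \beta = 0$ and the reduced model dispersion is $D(0)$), the statistic can be computed exactly, using the fact that for a single regressor a weighted median of the pairwise slopes $(Y_i - Y_j)/(x_i - x_j)$, with weights $w_{ij}|x_i - x_j|$, minimizes $D$. The paper only requires $\hat\gamma$ to be consistent and does not fix an estimator here, so the sketch below takes it as an input; the value used in the example is purely illustrative, and all names are hypothetical.

```python
from fractions import Fraction


def dispersion(x, y, w, b):
    """D(b) of (2.1) for a single regressor: sum_{i<j} w_ij |Z_i - Z_j|, Z = y - x b."""
    n = len(x)
    z = [yi - xi * b for xi, yi in zip(x, y)]
    return sum(w[i][j] * abs(z[i] - z[j]) for i in range(n) for j in range(i + 1, n))


def rank_slope(x, y, w):
    """Minimizer of D(b): weighted median of pairwise slopes, weights w_ij |x_i - x_j|."""
    n = len(x)
    pairs = sorted((Fraction(y[i] - y[j]) / (x[i] - x[j]), w[i][j] * abs(x[i] - x[j]))
                   for i in range(n) for j in range(i + 1, n)
                   if x[i] != x[j] and w[i][j] != 0)
    total, cumulative = sum(v for _, v in pairs), 0
    for slope, v in pairs:
        cumulative += v
        if 2 * cumulative >= total:
            return slope
    raise ValueError("no informative pairs")


def drop_in_dispersion(x, y, w, gamma_hat):
    """S_2 = (12 gamma_hat / n) [D(0) - D(beta_hat)] for H0: beta = 0 with p = 1.

    With p_1 = 0 the reduced model has no slope, so its dispersion is D(0).
    gamma_hat is supplied by the caller (any consistent estimate of gamma).
    """
    n = len(x)
    beta_hat = rank_slope(x, y, w)
    return Fraction(12) * gamma_hat / n * (dispersion(x, y, w, 0) - dispersion(x, y, w, beta_hat))


x = [-2, -1, 0, 1, 2]
y = [-3, -1, 1, 2, 5]
w = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
print(drop_in_dispersion(x, y, w, gamma_hat=Fraction(1, 4)))  # illustrative gamma_hat only
```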

The following remark concerns the asymptotic distribution of $S_2$ for testing purposes. The proof will be delayed to the Appendix.

REMARK 4.1. Let (A1)-(A9) hold. Then for $\beta = (0, \Delta_2/\sqrt{n})$, as $n \to \infty$,

$S_2 - 12\,U(0)'G'C_{22\cdot1}^{-1}GU(0) \xrightarrow{P} 0$.

Using Remark 2.1 when $\beta = (0, \Delta_2/\sqrt{n})$ for the distribution of $U(0)$, it follows that $S_2$ will be asymptotically chi-square if $GVG' = C_{22\cdot1}$.




Note that the distribution of $S_2$ is free of $\beta_1$, since $S_2|_{(\beta_1, \beta_2)} = S_2|_{(0, \beta_2)}$ by a translation property of $\hat\beta$ and $\hat\beta_{1R}$. The following remark summarizes.

REMARK 4.2. Let (A1)-(A9) hold. If $GVG' = C_{22\cdot1}$, then as $n \to \infty$,

$S_2\,\big|_{(\beta_1, \Delta_2/\sqrt{n})} \xrightarrow{d} \chi^2(p_2, \delta_2)$,

where $\delta_2 = 12\gamma^2\,\Delta_2'C_{22\cdot1}\Delta_2$.

The distribution of $S_2$ is asymptotically central chi-square when $H_0$ holds ($\Delta_2 = 0$) and noncentral chi-square under local alternatives $\beta_2 = \Delta_2/\sqrt{n}$. The test of approximate level $\alpha$ is to reject $H_0$ if $S_2 > \chi^2_{\alpha, p_2}$.
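The rejection rule needs the upper-$\alpha$ point $\chi^2_{\alpha,p_2}$. Python's standard library has no chi-square quantile function, so the sketch below (hypothetical names) uses two exact identities, $\chi^2_{\alpha,1} = z_{\alpha/2}^2$ and $\chi^2_{\alpha,2} = -2\ln\alpha$, and falls back on the Wilson-Hilferty approximation for other degrees of freedom. In practice a statistics library would normally supply the exact quantile.

```python
import math
from statistics import NormalDist


def chi2_upper(alpha, df):
    """Upper-alpha point of the central chi-square distribution.

    Exact for df = 1 (square of a normal quantile) and df = 2 (exponential
    with mean 2); otherwise the Wilson-Hilferty approximation is used.
    """
    if df == 1:
        return NormalDist().inv_cdf(1 - alpha / 2) ** 2
    if df == 2:
        return -2 * math.log(alpha)
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2 / (9 * df)
    return df * (1 - c + z * math.sqrt(c)) ** 3


def reject(statistic, alpha, df):
    """Approximate level-alpha rule: reject H0 when the statistic exceeds chi2_{alpha, df}."""
    return statistic > chi2_upper(alpha, df)


print(round(chi2_upper(0.05, 1), 4), round(chi2_upper(0.05, 2), 4))  # 3.8415 5.9915
print(reject(20.4, 0.05, 1))  # True
```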
5. AN ALIGNED RANK TEST


Rank tests based on aligned ranks have been discussed by many authors for linear model problems; see Lehmann (1963), Adichie (1978), Sen and Puri (1977). The basic principle in aligning the observations is to estimate the nuisance parameters and to test the remaining parameters with a suitable statistic as if there were no nuisance parameters present. The resulting test typically has good large sample properties. Small sample results are difficult to obtain in general.

In the present context the aligned rank method requires a reduced model estimate of $\beta_1$ and a statistic to measure the relationship of the reduced model residuals to $X_2$. Let $\hat\beta_{1R}$ be the reduced model estimate of $\beta_1$ of the previous section. Let

$\hat U = U(\hat\beta_{1R}, 0) = (1/2)\,n^{-3/2}X'\hat B$,

where $\hat B = (\hat B_i)$ using (2.2) and (2.6), $\hat B_i = \sum_j w_{ij}\,\mathrm{sgn}(\hat Z_i - \hat Z_j)$, and $\hat Z = Y - X_1\hat\beta_{1R} = (\hat Z_i)$ is the vector of reduced model residuals. Also form the partition

$\hat U = \begin{pmatrix} \hat U_1 \\ \hat U_2 \end{pmatrix} = (1/2)\,n^{-3/2}\begin{pmatrix} X_1'\hat B \\ X_2'\hat B \end{pmatrix}$,

where $\hat U_2$ is $p_2 \times 1$.

The aligned rank test statistic will be a quadratic form in $\hat U_2$. The following remark gives the necessary details. The proof is delayed to the Appendix.
REMARK 5.1. Let (A1)-(A9) hold. Then as $n \to \infty$,

$\hat U_2\,\big|_{(\beta_1, \Delta_2/\sqrt{n})} \xrightarrow{d} N(\gamma C_{22\cdot1}\Delta_2,\ (1/12)\,GVG')$.

With this result it is easy to see that the appropriate quadratic form is

$S_3 = 12\,\hat U_2'(GVG')^{-1}\hat U_2$.


The following remark is immediate from Remark 5.1.

REMARK 5.2. Let (A1)-(A9) hold. Then as $n \to \infty$,

$S_3\,\big|_{(\beta_1, \Delta_2/\sqrt{n})} \xrightarrow{d} \chi^2(p_2, \delta_3)$,

where the noncentrality parameter is $\delta_3 = 12\gamma^2\,\Delta_2'C_{22\cdot1}(GVG')^{-1}C_{22\cdot1}\Delta_2$.

This is the same noncentrality parameter as in Remark 3.1, and if (3.2) holds it equals the noncentrality parameter of Remark 4.2.

The distribution of $S_3$ is asymptotically central chi-square under $H_0$ ($\Delta_2 = 0$) and noncentral chi-square under local alternatives $\beta_2 = \Delta_2/\sqrt{n}$. The test of approximate level $\alpha$ is to reject $H_0$ if $S_3 > \chi^2_{\alpha, p_2}$.
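A sketch of $S_3$ for $p_1 = p_2 = 1$, under stated assumptions: $V$ and $C$ are replaced by $n^{-3}V_n$ and $n^{-2}C_n$ (our choice), in which case the powers of $n$ cancel and $S_3 = 3\,(X_2'\hat B)^2/(gV_ng')$ with $g = (-c_{21}/c_{11}, 1)$ built from the entries of $C_n$. The reduced estimate $\hat\beta_{1R}$ is taken as the weighted median of pairwise slopes on $X_1$ (weights $w_{ij}|x_{i1} - x_{j1}|$), which minimizes $D(b_1, 0)$ when $p_1 = 1$. The names are hypothetical.

```python
from fractions import Fraction


def sgn(v):
    return (v > 0) - (v < 0)


def rank_slope(x, y, w):
    """Weighted median of pairwise slopes; minimizes sum_{i<j} w_ij |Z_i - Z_j| for Z = y - x b."""
    n = len(x)
    pairs = sorted((Fraction(y[i] - y[j]) / (x[i] - x[j]), w[i][j] * abs(x[i] - x[j]))
                   for i in range(n) for j in range(i + 1, n)
                   if x[i] != x[j] and w[i][j] != 0)
    total, cumulative = sum(v for _, v in pairs), 0
    for slope, v in pairs:
        cumulative += v
        if 2 * cumulative >= total:
            return slope
    raise ValueError("no informative pairs")


def aligned_rank_s3(x1, x2, y, w):
    """S_3 = 3 (x2' B_hat)^2 / (g V_n g') with p_1 = p_2 = 1."""
    n = len(y)
    b1 = rank_slope(x1, y, w)                       # reduced model estimate beta_1R
    z = [yi - xi * b1 for xi, yi in zip(x1, y)]     # reduced model residuals
    B = [sum(w[i][j] * sgn(z[i] - z[j]) for j in range(n)) for i in range(n)]
    t = sum(a * b for a, b in zip(x2, B))           # X_2' B_hat

    def wx(x):  # W_n x, using (W_n x)_i = sum_j w_ij (x_i - x_j)
        return [sum(w[i][j] * (x[i] - x[j]) for j in range(n)) for i in range(n)]

    w1, w2 = wx(x1), wx(x2)
    c11 = sum(a * b for a, b in zip(x1, w1))
    c21 = sum(a * b for a, b in zip(x2, w1))
    v11 = sum(a * a for a in w1)
    v12 = sum(a * b for a, b in zip(w1, w2))
    v22 = sum(b * b for b in w2)
    r = Fraction(c21, c11)                          # g = (-r, 1)
    return 3 * t * t / (v22 - 2 * r * v12 + r * r * v11)


x1 = [-2, -1, 0, 1, 2]
x2 = [1, -1, 1, -1, 0]
y = [0, 3, 1, 4, 2]
w = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
s3 = aligned_rank_s3(x1, x2, y, w)
# Adding a multiple of x1 to y shifts beta_1R by the same amount and leaves S_3 unchanged.
s3_shifted = aligned_rank_s3(x1, x2, [yi + 3 * xi for xi, yi in zip(x1, y)], w)
print(s3, s3_shifted)  # 10/3 10/3
```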
6. CHOICE OF WEIGHTS

In the unweighted case, $w_{ij} = 1$, the asymptotic covariance matrix of the estimate $\hat\beta$ is a constant multiple of

(6.1)   $C^{-1}VC^{-1} = \Sigma^{-1}$.

As noted in Sievers (1983), this is a case of highest efficiency and the use of weights cannot improve on it. The finite sample size version of (6.1) is

(6.2)   $(X'WX)^{-1}X'WWX(X'WX)^{-1} = (X'X)^{-1}$,

and when this condition holds there will be no loss of efficiency in using weights. A weight matrix will satisfy (6.2) if and only if there is a nonsingular matrix $H$ such that

(6.3)   $WX = XH$.

A goal in selecting weights would then be to satisfy (6.2) or (6.3). In the remainder of this section and in the next section some situations will be discussed where it is possible to select weights to retain efficiency and gain in other aspects.
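Condition (6.3) is easy to check for a given design and weight set. If $WX = XH$ for some $H$ and $X$ has full column rank, then $H$ must equal $(X'X)^{-1}X'WX$, so it suffices to compute that candidate, verify $XH = WX$ and verify that $H$ is nonsingular. The sketch below (hypothetical names; exact arithmetic with Python's `fractions` module) does this.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve(A, B):
    """Solve A M = B exactly (A square and nonsingular) by Gauss-Jordan elimination."""
    m = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [row[m:] for row in M]


def determinant(A):
    """Exact determinant by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    m, det = len(M), Fraction(1)
    for c in range(m):
        piv = next((r for r in range(c, m) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return det


def satisfies_6_3(X, w):
    """True if W X = X H for some nonsingular H (condition (6.3)); X has full column rank."""
    n = len(X)
    W = [[sum(w[i]) if i == j else -w[i][j] for j in range(n)] for i in range(n)]
    WX = matmul(W, X)
    Xt = transpose(X)
    H = solve(matmul(Xt, X), matmul(Xt, WX))   # the only possible H
    return matmul(X, H) == WX and determinant(H) != 0


x = [[-1], [0], [1]]
unit = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
partial = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # only observations 1 and 2 are compared
print(satisfies_6_3(x, unit), satisfies_6_3(x, partial))  # True False
```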

The introduction indicated several one-way and higher order analysis of variance problems where restricted rankings are used. Such restricted comparison methods can be handled in the present context by using weights $w_{ij} = 1$ if the $i$th and $j$th observations are to be compared and zero otherwise. Examples in the next section indicate there need not be a loss of efficiency. Quade (1979b) and Silva and Quade (1980) explored the use of weights proportional to a measure of within-block variability for complete block designs.

In analysis of variance problems arising in practice, a frequently occurring difficulty is that the assumptions for the basic additive model are in doubt. In such situations the use of restricted comparisons can be helpful. The additive model formally allows only shifts in location between groups, but in practical applications there is often a drift from this that increases as the groups are further apart. Neighboring groups in a design can be reasonably close in variation and shape of distribution due to similar experimental influences, but this may deteriorate for groups that are more distant. The treatment may be affecting more than just the location of a distribution. Transformations can be tried to diminish this type of effect, but they are not always successful and they introduce problems in interpreting the results. In these situations the comparisons between distant groups can be inappropriate. By comparing only neighboring groups, the effects of group differences not formally specified in the model can be diminished. The focus is directly on the simple effects.

The specification of neighboring groups may be uncertain in a given problem, but for simplicity it should be enough to compare observations only when they are in immediately adjacent groups in the design. This approach is discussed more in the examples of the next section. Modifications could be made to compare groups that differ by more than one level. The idea here is directly analogous to the comparisons of "relevant pairs" proposed by Quade (1979a) for a multiple regression problem. He also suggested restricted comparisons for factorial designs.
7. EXAMPLES


Consider an analysis of variance model where data are available in $p + 1$ groups, labeled $G_0, G_1, \ldots, G_p$, and the numbers of observations in the groups are denoted $n_0, n_1, \ldots, n_p$, respectively. Let $n = n_0 + n_1 + \cdots + n_p$. Suppose parameters are defined so that $Y_i = \beta_0 + \beta_j + e_i$ if the $i$th observation is in group $G_j$, $1 \le j \le p$, and $Y_i = \beta_0 + e_i$ if it is in $G_0$. This is a convenient representation, similar to the means model except that $G_0$ is used as a reference group. The work of this paper is basically invariant with respect to reparametrizations, and results from this simple model will carry over to other choices for defining parameters. Thus one-way and higher order factorial designs, block designs, etc. can be discussed in this framework.

With $\beta' = (\beta_1,\ldots,\beta_p)$, the $n \times p$ design matrix is

(7.1)   $X = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \mathbf{1}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{1}_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_p \end{pmatrix}$,

where $\mathbf{1}_j$ is an $n_j \times 1$ vector of ones, $1 \le j \le p$, and $0$ represents a vector of zeros of appropriate length. This design matrix $X$ should be centered to match the previous use of this symbol.

Let the weights depend only on group membership, with $b_{jk} = b_{kj}$ the weight for comparing an observation in $G_j$ with an observation in $G_k$.


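The resulting weight matrix is

(7.2)   $W = \begin{pmatrix} c_0 I_0 & -b_{01}J_{01} & \cdots & -b_{0p}J_{0p} \\ -b_{10}J_{10} & c_1 I_1 & \cdots & -b_{1p}J_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ -b_{p0}J_{p0} & -b_{p1}J_{p1} & \cdots & c_p I_p \end{pmatrix}$,

where $I_j$ is an $n_j \times n_j$ identity matrix, $0 \le j \le p$, $J_{jk}$ is an $n_j \times n_k$ matrix of ones and $c_j = \sum_{k=0,\,k \ne j}^{p} b_{jk} n_k$, $0 \le j \le p$. Then

$WX = \begin{pmatrix} -b_{01}n_1\mathbf{1}_0 & -b_{02}n_2\mathbf{1}_0 & \cdots & -b_{0p}n_p\mathbf{1}_0 \\ c_1\mathbf{1}_1 & -b_{12}n_2\mathbf{1}_1 & \cdots & -b_{1p}n_p\mathbf{1}_1 \\ -b_{21}n_1\mathbf{1}_2 & c_2\mathbf{1}_2 & \cdots & -b_{2p}n_p\mathbf{1}_2 \\ \vdots & \vdots & \ddots & \vdots \\ -b_{p1}n_1\mathbf{1}_p & -b_{p2}n_2\mathbf{1}_p & \cdots & c_p\mathbf{1}_p \end{pmatrix}$

and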
(7.3)   $X'WX = \begin{pmatrix} c_1 n_1 & -b_{12}n_1n_2 & \cdots & -b_{1p}n_1n_p \\ -b_{21}n_2n_1 & c_2 n_2 & \cdots & -b_{2p}n_2n_p \\ \vdots & \vdots & \ddots & \vdots \\ -b_{p1}n_pn_1 & -b_{p2}n_pn_2 & \cdots & c_p n_p \end{pmatrix}$.

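A matrix $H$ satisfying (6.3) is

(7.4)   $H = \begin{pmatrix} c_1 + b_{01}n_1 & (b_{02} - b_{12})n_2 & \cdots & (b_{0p} - b_{1p})n_p \\ (b_{01} - b_{21})n_1 & c_2 + b_{02}n_2 & \cdots & (b_{0p} - b_{2p})n_p \\ \vdots & \vdots & \ddots & \vdots \\ (b_{01} - b_{p1})n_1 & (b_{02} - b_{p2})n_2 & \cdots & c_p + b_{0p}n_p \end{pmatrix}$,

and it remains to check whether this matrix is nonsingular to verify no loss in efficiency. By premultiplying (6.3) by $X'$, it is equivalent, and sometimes easier, to verify that $X'WX$ is nonsingular.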
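Formulas (7.2)-(7.4) can be checked on a small unbalanced example. The sketch below (hypothetical names; exact arithmetic) builds the centered design (7.1) and the group-weight matrix (7.2), then confirms that $X'WX$ has the form (7.3) and that $H$ of (7.4) satisfies $WX = XH$. It assumes, as the diagonal blocks $c_jI_j$ of (7.2) indicate, that there are no within-group comparisons; since the columns of $X$ are constant within groups, within-group weights would not change $WX$ in any case.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def group_design(sizes):
    """Centered version of (7.1): indicator columns for G_1..G_p; G_0 is the reference."""
    n, p = sum(sizes), len(sizes) - 1
    labels = [g for g, m in enumerate(sizes) for _ in range(m)]
    X = [[Fraction(int(lab == k)) - Fraction(sizes[k], n) for k in range(1, p + 1)]
         for lab in labels]
    return labels, X


def group_weight_matrix(labels, b):
    """W of (7.2): -b_jk between groups j != k, no within-group comparisons."""
    n = len(labels)
    w = [[0 if labels[i] == labels[j] else b[labels[i]][labels[j]] for j in range(n)]
         for i in range(n)]
    return [[sum(w[i]) if i == j else -w[i][j] for j in range(n)] for i in range(n)]


def h_matrix(sizes, b):
    """H of (7.4): (m, k) entry c_k + b_0k n_k if m == k, else (b_0k - b_mk) n_k."""
    p = len(sizes) - 1
    c = [sum(b[j][k] * sizes[k] for k in range(p + 1) if k != j) for j in range(p + 1)]
    return [[c[k] + b[0][k] * sizes[k] if m == k else (b[0][k] - b[m][k]) * sizes[k]
             for k in range(1, p + 1)] for m in range(1, p + 1)]


sizes = [2, 3, 1]
b = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
labels, X = group_design(sizes)
W = group_weight_matrix(labels, b)
WX = matmul(W, X)
H = h_matrix(sizes, b)
XtWX = matmul(transpose(X), WX)
print(H)      # [[6, 1], [0, 9]]
print(XtWX)   # (7.3): [[c_1 n_1, -b_12 n_1 n_2], [-b_21 n_2 n_1, c_2 n_2]]
```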

In this context the vector $U(b)$ of (2.6) can be expressed more directly in terms of the comparisons between groups. Direct multiplication shows that the $k$th element of $U(b)$ is

(7.5)   $U_k(b) = (1/2)\,n^{-3/2}\sum_{j \ne k} b_{jk}T_{jk}$,

where $T_{jk} = 2(\#G_j < G_k) - n_jn_k$ and $(\#G_j < G_k)$ is a Mann-Whitney statistic comparing groups $G_j$ and $G_k$.
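The identity (7.5) says that, apart from the factor $(1/2)n^{-3/2}$, the $k$th element of $X'B$ is a weighted sum of Mann-Whitney comparisons. The sketch below (hypothetical names; residuals without ties) computes it both from the coefficients $B_i$ of (2.2) and from the counts $(\#G_j < G_k)$.

```python
def sgn(v):
    return (v > 0) - (v < 0)


def xtb_direct(groups, b):
    """Elements k = 1..p of X'B, with B_i = sum_l w_il sgn(z_i - z_l) as in (2.2)."""
    z = [v for g in groups for v in g]
    labels = [j for j, g in enumerate(groups) for _ in g]
    n = len(z)
    B = [sum(b[labels[i]][labels[l]] * sgn(z[i] - z[l])
             for l in range(n) if labels[l] != labels[i]) for i in range(n)]
    # X is centered, but sum_i B_i = 0, so centering does not change X'B.
    return [sum(B[i] for i in range(n) if labels[i] == k) for k in range(1, len(groups))]


def xtb_mann_whitney(groups, b):
    """sum_{j != k} b_jk T_jk with T_jk = 2 #(G_j < G_k) - n_j n_k, as in (7.5)."""
    out = []
    for k in range(1, len(groups)):
        total = 0
        for j, gj in enumerate(groups):
            if j != k:
                count = sum(1 for u in gj for v in groups[k] if u < v)  # #(G_j < G_k)
                total += b[j][k] * (2 * count - len(gj) * len(groups[k]))
        out.append(total)
    return out


groups = [[1.2, 3.4], [2.2, 5.0, 0.7], [4.1, 6.3]]   # residuals, no ties
b = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(xtb_direct(groups, b), xtb_mann_whitney(groups, b))  # [-12, 20] [-12, 20]
```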


Several Treatments Versus a Control. Suppose $G_0$ is a control group. With group weights $b_{0j} = 1$ for $j = 1,\ldots,p$ and zero otherwise, the treatment groups are compared to the control group but not to each other. The matrix $H$ of (7.4) is nonsingular and there is no loss of efficiency. Remarks 2.1 and 2.3 along with (7.1)-(7.4) yield familiar results for this problem. Unequal sample sizes are readily handled. If covariates are present in the problem, the design matrix can be modified accordingly and the tests of sections 4 or 5 can be used.


One-Way Analysis of Variance. Suppose $G_0, \ldots, G_p$ represent $p + 1$ groups to be compared. In the unweighted case, $b_{jk} = 1$, the quadratic form in $U(0)$ based on Remark 2.1 yields the familiar Kruskal-Wallis test. As indicated in section 6, in some circumstances the use of restricted comparisons may be beneficial.
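With all weights equal to one and $V$ replaced by $n^{-3}V_n$ (an assumption of this illustration), the quadratic form $12\,U(0)'V^{-1}U(0)$ reduces to $3\,B'XV_n^{-1}X'B$ at $b = 0$. Because $W_n = nI - J$ and $B_i = 2R_i - n - 1$ in this case, a short calculation gives $(12/n^2)\sum_j n_j(\bar R_j - (n+1)/2)^2$, which is $(n+1)/n$ times the Kruskal-Wallis statistic, so the two are asymptotically equivalent. The sketch below (hypothetical names, no ties) checks this exactly.

```python
from fractions import Fraction


def sgn(v):
    return (v > 0) - (v < 0)


def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination (A nonsingular)."""
    m = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(rhs[i])] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][m] for i in range(m)]


def u_quadratic_form(groups):
    """3 B'X V_n^{-1} X'B at b = 0 with all weights equal to one."""
    sizes = [len(g) for g in groups]
    n, p = sum(sizes), len(groups) - 1
    z = [v for g in groups for v in g]
    labels = [j for j, g in enumerate(groups) for _ in g]
    X = [[Fraction(int(lab == k)) - Fraction(sizes[k], n) for k in range(1, p + 1)]
         for lab in labels]
    B = [sum(sgn(z[i] - z[j]) for j in range(n)) for i in range(n)]
    WX = [[sum(X[i][k] - X[j][k] for j in range(n)) for k in range(p)] for i in range(n)]
    Vn = [[sum(WX[i][k] * WX[i][l] for i in range(n)) for l in range(p)] for k in range(p)]
    xb = [sum(X[i][k] * B[i] for i in range(n)) for k in range(p)]
    return 3 * sum(a * c for a, c in zip(xb, solve(Vn, xb)))


def kruskal_wallis(groups):
    """Kruskal-Wallis statistic (no ties)."""
    z = sorted(v for g in groups for v in g)
    n = len(z)
    rank = {v: r + 1 for r, v in enumerate(z)}
    s = sum(len(g) * (Fraction(sum(rank[v] for v in g), len(g)) - Fraction(n + 1, 2)) ** 2
            for g in groups)
    return Fraction(12, n * (n + 1)) * s


groups = [[1.5, 3.2, 0.4], [2.8, 5.1], [4.4, 6.0, 7.3, 0.9]]
n = sum(len(g) for g in groups)
print(u_quadratic_form(groups), kruskal_wallis(groups) * Fraction(n + 1, n))
```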

Suppose there is a natural order in the treatment groups; for instance, the groups may correspond to increasing dose level, to increasing age of subjects or to geographic locations along a path. The discussion in section 6 suggests that the use of adjacent comparisons can be beneficial in such cases. Let $b_{jk} = 1$ if $j = k - 1$ for $k = 1,\ldots,p$ and zero otherwise (with $b_{kj} = b_{jk}$). Then observations are compared only to observations in an immediately adjacent group. An examination of (7.3) shows that $X'WX$ is nonsingular, so $H$ is nonsingular and there is no loss of efficiency. Thus the use of adjacent comparisons can be a useful alternative to the unweighted case, especially in the presence of model deficiencies.

Consider yet another pattern of weights for this problem. Let the weight of a comparison between groups $G_j$ and $G_k$ be given by $b_{jk} = 1/(n_jn_k)$. The effect is to give less weight to observations in groups with larger sample sizes. With these weights, (7.3) simplifies considerably and it is easy to show $H$ is nonsingular. Thus there is no loss of efficiency. Sievers (1983) discussed an influence matrix to measure the influence of the $i$th observation on the estimate of $\beta$. This $n \times p$ matrix for the estimate $\hat\beta$ minimizing (2.1) is $WX(X'WX)^{-1}$. The weights here yield the same influence matrix as the unweighted case. This follows by direct computation.
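The claim about the influence matrix can be checked numerically. The sketch below (hypothetical names; exact arithmetic) computes $WX(X'WX)^{-1}$ for the centered one-way design with unit weights and with $b_{jk} = 1/(n_jn_k)$. The agreement is an instance of a more general fact: whenever $WX = XH$ with $H$ nonsingular, $WX(X'WX)^{-1} = XH(X'XH)^{-1} = X(X'X)^{-1}$.

```python
from fractions import Fraction


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve(A, B):
    """Solve A M = B exactly (A square, nonsingular) by Gauss-Jordan elimination."""
    m = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [row[m:] for row in M]


def influence_matrix(sizes, b):
    """W X (X'W X)^{-1} for the centered one-way design (7.1) with group weights b."""
    n, p = sum(sizes), len(sizes) - 1
    labels = [g for g, m in enumerate(sizes) for _ in range(m)]
    X = [[Fraction(int(lab == k)) - Fraction(sizes[k], n) for k in range(1, p + 1)]
         for lab in labels]
    # (W X)_ik = sum_j w_ij (x_ik - x_jk); within-group terms vanish, so b[g][g] is irrelevant.
    WX = [[sum(b[labels[i]][labels[j]] * (X[i][k] - X[j][k]) for j in range(n))
           for k in range(p)] for i in range(n)]
    XtWX = [[sum(X[i][k] * WX[i][l] for i in range(n)) for l in range(p)] for k in range(p)]
    # X'W X is symmetric, so W X (X'W X)^{-1} is the transpose of (X'W X)^{-1} (W X)'.
    return transpose(solve(XtWX, transpose(WX)))


sizes = [2, 3, 1]
unit = [[1] * 3 for _ in range(3)]
inverse_product = [[Fraction(1, sizes[j] * sizes[k]) for k in range(3)] for j in range(3)]
print(influence_matrix(sizes, unit) == influence_matrix(sizes, inverse_product))  # True
```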


Ordered Alternatives. In the notation of the previous one-way layout, the ordered alternative specifies $0 \le \beta_1 \le \cdots \le \beta_p$ with some strict inequality. Making all comparisons requires that $G_j$ and $G_k$ be compared for all $j < k$. Test statistics based on ranks and on $\sum_{j<k} T_{jk}/(n_jn_k)$ have been proposed for this problem; see Jonckheere (1954). These statistics are linear combinations of $U(0)$ as given in (7.5), using suitable coefficients and weights $b_{jk}$. Tryon and Hettmansperger (1973) showed value in using only adjacent comparisons with a statistic $\sum_{j=1}^{p} a_j T_{j-1,j}$. This can be obtained from (7.5) with weights $b_{jk} = 1$ if $j = k - 1$ and zero otherwise.
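The two kinds of statistic can be sketched as follows (hypothetical names; data without ties; the coefficients $a_j$ are left to the user, and unit values are used only for illustration). The last lines connect the adjacent statistic to (7.5): with $b_{jk} = 1$ only for $|j - k| = 1$, the $k$th element of $X'B$ is $T_{k-1,k} - T_{k,k+1}$ (with $T_{p,p+1} = 0$), so its tail sums recover each $T_{m-1,m}$.

```python
from fractions import Fraction


def sgn(v):
    return (v > 0) - (v < 0)


def mann_whitney_t(gj, gk):
    """T_jk = 2 #(G_j < G_k) - n_j n_k (no ties assumed)."""
    return 2 * sum(1 for u in gj for v in gk if u < v) - len(gj) * len(gk)


def all_pairs_statistic(groups):
    """sum_{j<k} T_jk / (n_j n_k): every pair of groups is compared."""
    m = len(groups)
    return sum(Fraction(mann_whitney_t(groups[j], groups[k]), len(groups[j]) * len(groups[k]))
               for j in range(m) for k in range(j + 1, m))


def adjacent_statistic(groups, coef):
    """sum_{j=1}^p a_j T_{j-1,j}: only neighbouring groups are compared."""
    return sum(a * mann_whitney_t(groups[j - 1], groups[j]) for j, a in enumerate(coef, start=1))


def xtb_adjacent(groups):
    """(X'B)_k, k = 1..p, from (2.2) with b_jk = 1 only for |j - k| = 1."""
    z = [v for g in groups for v in g]
    labels = [j for j, g in enumerate(groups) for _ in g]
    B = [sum(sgn(z[i] - z[l]) for l in range(len(z)) if abs(labels[i] - labels[l]) == 1)
         for i in range(len(z))]
    return [sum(B[i] for i in range(len(z)) if labels[i] == k) for k in range(1, len(groups))]


groups = [[1, 2], [3, 4, 5], [6]]
xtb = xtb_adjacent(groups)
print(all_pairs_statistic(groups), adjacent_statistic(groups, [1, 1]))  # 3 9
print([sum(xtb[m - 1:]) for m in range(1, len(groups))],
      [mann_whitney_t(groups[m - 1], groups[m]) for m in range(1, len(groups))])  # [6, 3] [6, 3]
```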

By choosing appropriate weights $b_{jk}$, these linear combination statistics can have a wide variety of coefficients. There is some value in this general view. Many familiar results on means, variances and asymptotic normality follow as special cases of Remark 2.1. Unbalanced cases cause no special trouble.

By using the aligned rank method of section 5, the presence of nuisance parameters or covariates can be handled in a straightforward, unified manner. The method can be adapted to provide a suitable test statistic for variations in the standard model. For instance, if the groups are observed in incomplete blocks, the use of suitable 0-1 weights can restrict comparisons to within-block comparisons to cancel out block effects, and then a suitable linear combination of the resulting comparison statistics could be chosen to test the ordered alternative. As another example, suppose the alternative hypothesis only specifies a partial ordering on the $\beta_j$ instead of a complete ordering. For such a case the use of 0-1 weights can restrict comparisons to those matching the alternative hypothesis.

Two-Factor Analysis of Variance. A 2x2 layout will be discussed for simplicity, but extensions to larger tables and higher dimensions will be apparent. The notation of the beginning of this section will be retained, but with a change of parameters, to avoid introducing double subscripts. Label the cells and the model parameters as follows:

$\begin{array}{ll} G_0:\ \mu & G_1:\ \mu + \beta_1 \\ G_2:\ \mu + \alpha_1 & G_3:\ \mu + \alpha_1 + \beta_1 + \gamma_1 \end{array}$

Thus $\alpha_1$ is a row effect, $\beta_1$ a column effect and $\gamma_1$ an interaction effect. The parameters here are related to the parameters $\beta$ of the beginning of this section by a nonsingular linear transformation, and as a consequence results obtained in one case apply to the other.

Using the earlier notation for sample sizes, the $n \times 3$ design matrix is

$X = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \mathbf{1}_1 & 0 \\ \mathbf{1}_2 & 0 & 0 \\ \mathbf{1}_3 & \mathbf{1}_3 & \mathbf{1}_3 \end{pmatrix}$,

and it should yet be centered to conform to earlier notation.

The main weighting scheme to be considered here is the scheme that uses only within-row/within-column comparisons, abbreviated WR/WC. In this scheme groups are compared (nonzero weights) if they are in the same row or same column, while diagonal comparisons are omitted with zero weights ($b_{03} = b_{12} = 0$). In this way comparisons are made between observations that differ only in the level of one factor. This follows the general rationale of section 6.

If the nonzero weights are one, the WR/WC weight matrix is

(7.6)   $W = \begin{pmatrix} (n_1 + n_2)I_0 & -J_{01} & -J_{02} & 0 \\ -J_{10} & (n_0 + n_3)I_1 & 0 & -J_{13} \\ -J_{20} & 0 & (n_0 + n_3)I_2 & -J_{23} \\ 0 & -J_{31} & -J_{32} & (n_1 + n_2)I_3 \end{pmatrix}$.

A straightforward calculation shows that (6.3) holds and there is no loss of efficiency with these weights.

The earlier work applied here shows that the weights $b_{jk} = 1/(n_jn_k)$ retain full efficiency. This is also the case if these weights are used for the nonzero weights in the WR/WC scheme.

Consider the model above with no interaction ($\gamma_1 = 0$). If all comparisons are used in testing $H_0: \alpha_1 = 0$, the column effect $\beta_1$ is a nuisance parameter and the methods of section 4 or 5 would be needed. However, if the WR/WC weight scheme (7.6) is used, the nuisance parameter cancels out and the test statistic can avoid estimating it. This is a desirable feature and it clearly extends to larger layouts. There is no loss of efficiency if the sample sizes are equal. However, with unbalanced cases there can be a loss. Perhaps (7.6) can be further modified to resolve this problem.
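The efficiency statements for the WR/WC scheme can be checked with condition (6.3). The sketch below (hypothetical names; exact arithmetic) builds the centered $2 \times 2$ design, either with the interaction column or without it, applies the unit WR/WC weights of (7.6) and tests whether $WX = XH$ for a nonsingular $H$ (the only candidate being $H = (X'X)^{-1}X'WX$). The three calls are illustrative: the interaction model with unequal cell sizes, and the additive model with equal and with unequal cell sizes.

```python
from fractions import Fraction

# Design rows (alpha_1, beta_1, gamma_1) for cells G_0..G_3 of the 2x2 layout.
CELLS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
WRWC = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # b_03 = b_12 = 0


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve(A, B):
    """Solve A M = B exactly (A square, nonsingular) by Gauss-Jordan elimination."""
    m = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [row[m:] for row in M]


def determinant(A):
    M = [[Fraction(v) for v in row] for row in A]
    m, det = len(M), Fraction(1)
    for c in range(m):
        piv = next((r for r in range(c, m) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return det


def no_efficiency_loss(sizes, columns):
    """Check (6.3) for the unit WR/WC weights with the chosen (centered) design columns."""
    labels = [g for g, m in enumerate(sizes) for _ in range(m)]
    n = len(labels)
    raw = [[CELLS[g][c] for c in columns] for g in labels]
    means = [Fraction(sum(r[c] for r in raw), n) for c in range(len(columns))]
    X = [[r[c] - means[c] for c in range(len(columns))] for r in raw]
    w = [[WRWC[labels[i]][labels[j]] for j in range(n)] for i in range(n)]
    W = [[sum(w[i]) if i == j else -w[i][j] for j in range(n)] for i in range(n)]
    WX = matmul(W, X)
    Xt = transpose(X)
    H = solve(matmul(Xt, X), matmul(Xt, WX))
    return matmul(X, H) == WX and determinant(H) != 0


print(no_efficiency_loss([1, 2, 3, 4], columns=[0, 1, 2]))  # interaction model: True
print(no_efficiency_loss([2, 2, 2, 2], columns=[0, 1]))     # additive, balanced: True
print(no_efficiency_loss([1, 1, 1, 2], columns=[0, 1]))     # additive, unbalanced: False
```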

When  the  two-way  layout  is  larger  in  size  than  2x2  there  is 
another  possible  weighting  scheme  consistent  with  the  remarks  of  section  6. 
In  the  WR/WC  scheme,  instead  of  comparing  a  group  to  all  other  groups  in 
the  same  row  or  column,  compare  each  group  to  only  its  immediate  neighbors 
in  the  same  row  and  column.  This  adjacent  WR/WC  plan  should  prove  to  be 
useful  and  warrants  further  study. 

In the model with no interaction, consider testing the hypothesis of no column effect, $H_0: \delta_1 = 0$, with the aligned rank procedure of section 5. To align the rows, the estimate of $\alpha_1$ suggested in section 5 is a rank estimate. The literature is quite varied on this point, and other $\sqrt{n}$-consistent estimates have been suggested, for example, differences of row means or medians. Suppose the estimate of $\alpha_1$ is denoted by $a_1$ and the cells are aligned by subtracting $a_1$ from all observations in the second row. The aligned rank statistic $U_2$ depends on $a_1$, and to examine this relationship the derivative $dU_2/da_1$ will be calculated. It will be shown that this derivative is zero for certain choices of weights. Thus the use of appropriate weights can reduce the effect of alignment on the test statistic.

First note that $U_2$ is the second element of $X'B$ and, except for a constant, is given by

$$b_{01}T_{01} + b_{21}T_{21} + b_{03}T_{03} + b_{23}T_{23} \tag{7.7}$$


where $T_{jk} = 2n_jn_k\left[\int F_j\,dF_k - 1/2\right]$ and $F_j$ is the empirical cdf of the (aligned) observations in group $j = 0, 1, 2, 3$.
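The quantities in (7.7) are easy to compute directly. With empirical cdfs, $\int F_j\,dF_k = \#\{x_i \le y_l\}/(n_j n_k)$, where the $x_i$ are from group $j$ and the $y_l$ from group $k$. Hence $T_{jk} = 2\,\#\{x_i \le y_l\} - n_j n_k$, which equals $\#\{x_i < y_l\} - \#\{x_i > y_l\}$ when there are no ties. The sketch below evaluates (7.7) after aligning the second row (groups 2 and 3) by $a_1$. The function names and data are illustrative.

```python
import numpy as np


def T(x, y):
    """Empirical T_jk = 2 n_j n_k [int F_j dF_k - 1/2] = 2 #{x_i <= y_l} - n_j n_k."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 2.0 * np.sum(x[:, None] <= y[None, :]) - len(x) * len(y)


def U2(groups, b, a1):
    """(7.7) up to a constant, after subtracting a1 from the second-row groups 2 and 3."""
    g = [np.asarray(x, dtype=float) for x in groups]
    g[2] = g[2] - a1
    g[3] = g[3] - a1
    return (b[0][1] * T(g[0], g[1]) + b[2][1] * T(g[2], g[1])
            + b[0][3] * T(g[0], g[3]) + b[2][3] * T(g[2], g[3]))
```

With WR/WC weights ($b_{21} = b_{03} = 0$) only $T_{01}$ and $T_{23}$ remain, and each compares two groups that are both unaligned or both aligned, so the empirical statistic does not depend on $a_1$ at all in this case.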
To avoid complications without losing the main point, consider the $F_j$ as continuous cdfs with densities. Without loss of generality, assume the true value of $\alpha_1$ is zero, and suppose $H_0: \delta_1 = 0$ holds. Then write

$$F_0(x) = F(x), \quad F_1(x) = F(x), \quad F_2(x) = F(x + a_1), \quad F_3(x) = F(x + a_1).$$

Substituting these into (7.7), only $T_{21}$ and $T_{03}$ depend on $a_1$, since $T_{01}$ and $T_{23}$ each compare two groups with the same cdf and are therefore zero. Differentiating and evaluating at $a_1 = 0$ yields


$$\left.\frac{dU_2}{da_1}\right|_{a_1 = 0} = 2\,(n_1 n_2 b_{12} - n_0 n_3 b_{03}) \int f^2(x)\,dx,$$

where $f$ is the density of $F$.


This derivative is zero when $b_{12} = b_{03} = 0$, that is, in the case of WR/WC comparisons; it is also zero when $b_{jk} = 1/(n_j n_k)$. In these two cases, at least, the rate of change of $U_2$ with respect to the aligning quantity $a_1$ is zero.
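The derivative formula can be checked numerically. The sketch below takes $F$ to be the standard normal cdf, an assumption made only for this check, for which $\int f^2 = 1/(2\sqrt{\pi})$. It evaluates the population version of (7.7) by Simpson's rule and compares a central-difference derivative at $a_1 = 0$ with $2(n_1n_2b_{12} - n_0n_3b_{03})\int f^2$. Weights are passed as symmetric $4 \times 4$ lists.

```python
import math


# Assumption for this check only: F is the standard normal cdf.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(g, lo=-12.0, hi=12.0, m=4000):
    """Composite Simpson rule for the integral of g over [lo, hi] (m even)."""
    h = (hi - lo) / m
    s = g(lo) + g(hi)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return s * h / 3.0


def U2_population(a1, n, b):
    """(7.7) with F_0 = F_1 = F and F_2 = F_3 = F(. + a1), i.e. second row aligned by a1."""
    cdf = [Phi, Phi, lambda x: Phi(x + a1), lambda x: Phi(x + a1)]
    pdf = [phi, phi, lambda x: phi(x + a1), lambda x: phi(x + a1)]

    def T(j, k):
        # T_jk = 2 n_j n_k [int F_j dF_k - 1/2], with dF_k = f_k(x) dx
        return 2 * n[j] * n[k] * (simpson(lambda x: cdf[j](x) * pdf[k](x)) - 0.5)

    return (b[0][1] * T(0, 1) + b[2][1] * T(2, 1)
            + b[0][3] * T(0, 3) + b[2][3] * T(2, 3))


def numerical_derivative(n, b, eps=1e-4):
    """Central difference of U2_population at a1 = 0."""
    return (U2_population(eps, n, b) - U2_population(-eps, n, b)) / (2 * eps)


def closed_form(n, b):
    """2 (n1 n2 b12 - n0 n3 b03) int f^2, with int f^2 = 1 / (2 sqrt(pi)) for N(0, 1)."""
    return 2 * (n[1] * n[2] * b[1][2] - n[0] * n[3] * b[0][3]) / (2 * math.sqrt(math.pi))
```

With unit weights and $n = (3, 4, 5, 6)$ the derivative is $2(20 - 18)/(2\sqrt{\pi}) = 2/\sqrt{\pi}$, while the WR/WC and $1/(n_jn_k)$ weightings both give zero.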


Block Designs. The weighting schemes can be used with block designs. Some
care  must  be  taken,  however,  since  the  asymptotic  results  here  apply  when 
the  within-cell  sample  sizes  grow  large  and  not,  for  example,  in  a  case 
of  a  fixed  block  size  with  the  number  of  blocks  growing  large.  Assumptions 
(A^)  -  (A^)  may  not  hold.  The  asymptotic  results  here  can  also  apply  when 
a  given  design  is  replicated  and  the  number  of  replications  grows  large. 

Consider  a  case  where  a  basic  design  with  m  observations  per  cell, 
m  large,  is  replicated  with  replications  corresponding  to  blocks.  By 
using  zero  weights  the  between  block  comparisons  can  be  eliminated  and 
block  effects  would  cancel  out.  Tests  of  hypotheses  about  the  parameters 
in  the  basic  design  (or  about  a  subset  of  them)  can  be  constructed  from 
the  general  results  here.  Covariates  can  be  handled  in  the  same  framework. 
Unbalanced  sample  sizes  cause  no  special  problem. 
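A toy illustration of this remark follows. Two blocks each contain a group 0 and a group 1, and the statistic sums $T_{jk}$ over group-0 versus group-1 comparisons, using the empirical $T_{jk} = 2\,\#\{x_i \le y_l\} - n_j n_k$. If between-block pairs get weight zero, the statistic is invariant to an additive block effect; if all pairs are used, it is not. This is a sketch with made-up data and hypothetical names, not a full test procedure.

```python
import numpy as np


def T(x, y):
    """Empirical T_jk = 2 #{x_i <= y_l} - n_j n_k."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 2.0 * np.sum(x[:, None] <= y[None, :]) - len(x) * len(y)


def treatment_statistic(data, within_block_only=True):
    """data[block][group] holds one cell; compares group 0 with group 1.

    With within_block_only=True, between-block comparisons get weight zero.
    """
    total = 0.0
    for s, cells_s in enumerate(data):
        for t, cells_t in enumerate(data):
            if within_block_only and s != t:
                continue  # zero weight: between-block pair
            total += T(cells_s[0], cells_t[1])
    return total


def add_block_effect(data, block, effect):
    """Add a common effect to every observation in one block."""
    return [[[v + effect for v in cell] for cell in cells] if s == block else cells
            for s, cells in enumerate(data)]
```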
8. APPENDIX

Proof of Remark 4.1. Some additional notation is needed. Let $D^*(\Delta) = (1/n)D(\Delta/\sqrt{n})$. Then $\hat\Delta = \sqrt{n}\,\hat\beta$ minimizes $D^*(\Delta)$. Similarly, $\hat\Delta_{1R} = \sqrt{n}\,\hat\beta_{1R}$ minimizes $D^*(\Delta_1, 0)$. Define a quadratic function

$$Q(\Delta) = \gamma\,\Delta'C\Delta - 2\Delta'U(0) + D^*(0).$$

Then $\Delta^* = \gamma^{-1}C^{-1}U(0)$ minimizes $Q(\Delta)$ and $\Delta^*_{1R} = \gamma^{-1}C_{11}^{-1}U_1(0)$ minimizes $Q(\Delta_1, 0)$, where $U_1(0)$ is the first $p_1$ elements of $U(0)$.

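It is sufficient to use $\gamma$ in place of $\hat\gamma$ in $S_2$. Then

$$
\begin{aligned}
S_2 &= 12\gamma\,[D^*(\hat\Delta_{1R}, 0) - D^*(\hat\Delta)] \\
&= 12\gamma\,\{[D^*(\hat\Delta_{1R}, 0) - Q(\hat\Delta_{1R}, 0)] + [Q(\hat\Delta_{1R}, 0) - Q(\Delta^*_{1R}, 0)] \\
&\qquad + [Q(\Delta^*_{1R}, 0) - Q(\Delta^*)] + [Q(\Delta^*) - Q(\hat\Delta)] \\
&\qquad + [Q(\hat\Delta) - D^*(\hat\Delta)]\}.
\end{aligned}
$$

Now Theorem 6.1 and Lemma 6.4 of Sievers (1983) can be used to show that, if $\beta = 0$, the terms above, except the middle one, converge in probability to zero. This can be extended to the case $\beta = (0', \Delta_2'/\sqrt{n})'$ by contiguity. But the middle term above is

$$12\,[U(0)'C^{-1}U(0) - U_1(0)'C_{11}^{-1}U_1(0)] = 12\,U(0)'G'C_{22\cdot1}^{-1}GU(0),$$

and the result follows.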
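The identity used for the middle term can be verified numerically. The sketch assumes $G = [-C_{21}C_{11}^{-1},\ I]$ and $C_{22\cdot1} = C_{22} - C_{21}C_{11}^{-1}C_{12}$, the natural choices for which the identity holds. It computes $Q(\Delta^*_{1R}, 0) - Q(\Delta^*)$ from the two minimizers and compares $12\gamma$ times this difference with both sides of the display. Here $C$, $U$ and $\gamma$ are random stand-ins, not quantities from the paper.

```python
import numpy as np


def middle_term(C, U, p1, gamma):
    """Q(Delta*_1R, 0) - Q(Delta*) for Q(D) = gamma D'CD - 2 D'U (+ a constant that cancels)."""
    def Q(D):
        return gamma * D @ C @ D - 2 * D @ U

    full = np.linalg.solve(C, U) / gamma                        # Delta* = C^{-1} U / gamma
    reduced = np.zeros_like(U)
    reduced[:p1] = np.linalg.solve(C[:p1, :p1], U[:p1]) / gamma  # (Delta*_1R, 0)
    return Q(reduced) - Q(full)


def schur_form(C, U, p1):
    """U'G' C_{22.1}^{-1} G U with the assumed G = [-C21 C11^{-1}, I]."""
    C11, C12 = C[:p1, :p1], C[:p1, p1:]
    C21, C22 = C[p1:, :p1], C[p1:, p1:]
    C22_1 = C22 - C21 @ np.linalg.solve(C11, C12)
    GU = U[p1:] - C21 @ np.linalg.solve(C11, U[:p1])
    return GU @ np.linalg.solve(C22_1, GU)
```

Because the reduced minimum can never be below the full minimum, the middle term is nonnegative, as a chi-square type statistic should be.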



Proof of Remark 5.1. By a translation property of the reduced model estimate $\hat\beta_{1R}$, it is enough to prove the result when $\beta_1 = 0$, since the distribution of $\hat U_2$ is unaffected by the value of $\beta_1$.

From Remark 2.3 it follows that $\hat\Delta_{1R} = \sqrt{n}\,\hat\beta_{1R}$ has a limiting normal distribution and is $O_p(1)$. This, with Remark 2.2, implies

$$U(\hat\beta_{1R}, 0) - U(0) + \gamma\,C(\hat\Delta_{1R}', 0')' \xrightarrow{P} 0 \quad \text{under } (0, 0).$$

Using a contiguity argument,

$$\hat U - U(0) + \gamma\,C(\hat\Delta_{1R}', 0')' \xrightarrow{P} 0 \quad \text{under } (0, \Delta_2/\sqrt{n}),$$

where $\hat U = U(\hat\beta_{1R}, 0)$.

This result continues to hold if the fixed matrix $G$ is multiplied on the left, and this drops out the term containing $\hat\Delta_{1R}$. Thus

$$G\hat U - GU(0) \xrightarrow{P} 0 \quad \text{under } (0, \Delta_2/\sqrt{n}).$$

But by Remark 2.1,

$$GU(0) \xrightarrow{D} N\!\left(\gamma\,C_{22\cdot1}\Delta_2,\ (1/12)\,GVG'\right) \quad \text{under } (0, \Delta_2/\sqrt{n}),$$

and so $G\hat U$ has this same limiting distribution. The proof is finished by noting that $G\hat U = \hat U_2 - C_{21}C_{11}^{-1}\hat U_1 = \hat U_2$, since $\hat U_1 = 0$, being (essentially) the derivative of the reduced model dispersion function evaluated at the reduced model estimate.
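The step in which the $\hat\Delta_{1R}$ term drops out can be seen directly. Take $G = [-C_{21}C_{11}^{-1},\ I]$, which is assumed here because $G$ is not defined in this section. Then $GC$ annihilates any vector of the form $(\Delta_1', 0')'$, maps $(0', \Delta_2')'$ to $C_{22\cdot1}\Delta_2$ (which gives the limiting mean $\gamma C_{22\cdot1}\Delta_2$), and satisfies $GCG' = C_{22\cdot1}$. The sketch checks these facts for a random positive definite $C$.

```python
import numpy as np


def G_matrix(C, p1):
    """Assumed G = [-C21 C11^{-1}, I]."""
    C11, C21 = C[:p1, :p1], C[p1:, :p1]
    p2 = C.shape[0] - p1
    return np.hstack([-C21 @ np.linalg.inv(C11), np.eye(p2)])


def schur_complement(C, p1):
    """C_{22.1} = C22 - C21 C11^{-1} C12."""
    C11, C12 = C[:p1, :p1], C[:p1, p1:]
    C21, C22 = C[p1:, :p1], C[p1:, p1:]
    return C22 - C21 @ np.linalg.solve(C11, C12)
```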